Imbalanced Dataset Classification and Solutions: a Review
نویسندگان
چکیده
-Imbalanced data set problem occurs in classification, where the number of instances of one class is much lower than the instances of the other classes. The main challenge in imbalance problem is that the small classes are often more useful, but standard classifiers tend to be weighed down by the huge classes and ignore the tiny ones. In machine learning the imbalanced datasets has become a critical problem and also usually found in many applications such as detection of fraudulent calls, bio-medical, engineering, remote-sensing, computer society and manufacturing industries. In order to overcome the problems several approaches have been proposed. In this paper a study on Imbalanced dataset problem and the solution is given.
منابع مشابه
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...
متن کاملImproving Imbalanced data classification accuracy by using Fuzzy Similarity Measure and subtractive clustering
Classification is an one of the important parts of data mining and knowledge discovery. In most cases, the data that is utilized to used to training the clusters is not well distributed. This inappropriate distribution occurs when one class has a large number of samples but while the number of other class samples is naturally inherently low. In general, the methods of solving this kind of prob...
متن کاملEnhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining
This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...
متن کاملAn insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics
Training classifiers with datasets which suffer of imbalanced class distributions is an important problem in data mining. This issue occurs when the number of examples representing the class of interest is much lower than the ones of the other classes. Its presence in many real-world applications has brought along a growth of attention from researchers. We shortly review the many issues in mach...
متن کاملFirst study of the behaviour of genetic fuzzy classifier based on low quality data respect to the preprocessing of low quality imbalanced datasets
There are real-world dataset where we can found classes with a very different percentage of patterns between them, that is to say we have classes represented by many examples (high percentage of patterns) and classes represented by few examples (low percentage of patterns). These kind of datasets receive the name of “imbalanced datasets”. In the field of classification problems the imbalanced d...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014